HADA: A Graph-Based Amalgamation Framework in Image-text Retrieval

Authors

Abstract

Many models have been proposed for vision and language tasks, especially the image-text retrieval task. State-of-the-art (SOTA) models in this challenge contain hundreds of millions of parameters. They were also pretrained on large external datasets, which has been proven to significantly improve overall performance. However, it is not easy to propose a new model with a novel architecture and intensively train it on a massive dataset with many GPUs to surpass the many SOTA models that are already available to use on the Internet. In this paper, we propose a compact graph-based framework named HADA, which can combine pretrained models to produce a better result rather than starting from scratch. Firstly, we created a graph structure in which the nodes were the features extracted from the pretrained models, with edges connecting them. The graph structure was employed to capture and fuse the information from every pretrained model. Then a graph neural network was applied to update the connection between the nodes to get the representative embedding vector for an image and a text. Finally, we employed cosine similarity to match images with their relevant texts and vice versa to ensure a low inference time. Our experiments show that, although HADA contained a tiny number of trainable parameters, it could increase baseline performance by more than 3.6% in terms of evaluation metrics on the Flickr30k dataset. Additionally, the proposed model did not train on any external dataset and only required a single GPU to train due to the small number of parameters required. The source code is available at https://github.com/m2man/HADA.
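The final matching step described in the abstract can be illustrated with a small sketch. This is not the HADA implementation: the function names, the toy vectors, and in particular `fuse_mean` (a plain average standing in for the graph neural network fusion) are assumptions made only for illustration. It shows how embeddings from two hypothetical pretrained models might be fused into one vector, and how cosine similarity then ranks candidate texts for an image.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_mean(feature_vectors):
    """Hypothetical stand-in for graph-based fusion (NOT HADA's GNN):
    average the L2-normalized features of the same item from several models.
    Assumes all vectors share one dimension."""
    normalized = [l2_normalize(v) for v in feature_vectors]
    dim = len(normalized[0])
    return [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity as the dot product of the unit-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_texts(image_embedding, text_embeddings):
    """Return text indices sorted from most to least similar to the image."""
    scores = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy features for one image from two hypothetical pretrained models.
image_from_model_a = [1.0, 0.0, 0.0]
image_from_model_b = [0.8, 0.6, 0.0]
image_embedding = fuse_mean([image_from_model_a, image_from_model_b])

# Toy embeddings for three candidate texts.
text_embeddings = [
    [0.0, 0.0, 1.0],
    [0.9, 0.3, 0.0],
    [0.0, 1.0, 0.0],
]

ranking = rank_texts(image_embedding, text_embeddings)
print(ranking)  # text 1 ranks first, then text 2, then text 0
```

Because matching reduces to dot products between precomputed unit vectors, all image and text embeddings can be computed once and compared cheaply at query time, which is consistent with the low inference time the abstract emphasizes.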

Similar resources

Image retrieval using the combination of text-based and content-based algorithms

Image retrieval is an important research field that has received great attention in recent decades. In this paper, we present an approach to image retrieval based on the combination of text-based and content-based features. Keywords are used as text-based features, and color and texture as content-based features. A query in this system contains some keywords and an input...

Graph-based Semi-Supervised Learning Framework for Medical Image Retrieval

As low-level features cannot reflect high-level semantics in medical image search, in this paper we propose an image retrieval algorithm that combines visual concepts and local features through a graph-based semi-supervised learning framework. More specifically, we construct a graph model from the distance between images, and add a density similarity measure in the label propagation process to get the membershi...

A Radon-based Convolutional Neural Network for Medical Image Retrieval

Image classification and retrieval systems have gained more attention because of easier access to high-tech medical imaging. However, the lack of large-scale, balanced, labelled data in medicine is still a challenge. Simplicity, practicality, efficiency, and effectiveness are the main targets in the medical domain. To achieve these goals, Radon transformation, which is a well-known t...

A graph-based image annotation framework

Automatic image annotation is crucial for keyword-based image retrieval because it can be used to improve the textual description of images. In this paper, we propose a unified framework for image annotation, which contains two kinds of learning processes and incorporates three kinds of relations among images and keywords. In addition, we propose some improvements on its components, i.e. a rein...

Image-Based Document Vectors for Text Retrieval

We propose a method for constructing a vector for a document image to represent its content to facilitate text retrieval. The method is based on an N-Gram algorithm for text similarity measure based on the frequency of occurrence of n-character strings appearing in the electronic text. Instead of using ASCII values, the present study investigates the use of character images to obtain the docume...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2023

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-28244-7_45